Among the various sensors for assisted and automated driving systems, automotive radar is considered a robust and low-cost solution even in adverse weather or lighting conditions. With recent developments in radar technologies and open-sourced annotated data sets, semantic segmentation with radar signals has become very promising. However, existing methods are either computationally expensive or discard significant amounts of valuable information from raw 3D radar signals by reducing them to 2D planes via averaging. In this work, we introduce ERASE-Net, an Efficient RAdar SEgmentation Network that semantically segments raw radar signals. The core of our approach is a novel detect-then-segment method for raw radar signals: it first detects the center point of each object, then extracts a compact radar signal representation, and finally performs semantic segmentation. We show that our method can achieve superior performance on the radar semantic segmentation task compared to the state-of-the-art (SOTA) technique. Furthermore, our approach requires up to 20x less computational resources. Finally, we show that the proposed ERASE-Net can be compressed by 40% without significant loss in performance, substantially more than the SOTA network, which makes it a more promising candidate for practical automotive applications.
translated by Google Translate
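The detect-then-segment pipeline described above can be sketched as plain control flow. The three stage functions below (`detect_centers`, `extract_patch`, `segment`) are hypothetical placeholders standing in for ERASE-Net's learned sub-networks; only the overall structure is shown, not the paper's implementation:

```python
import numpy as np

def detect_then_segment(radar_tensor, detect_centers, extract_patch, segment):
    """Three-stage detect-then-segment pipeline (structural sketch only).

    Stage 1 finds object center points, stage 2 extracts a compact
    per-object representation around each center, and stage 3 runs
    semantic segmentation on that compact representation.
    """
    masks = []
    for center in detect_centers(radar_tensor):
        patch = extract_patch(radar_tensor, center)   # compact representation
        masks.append((center, segment(patch)))        # per-object semantics
    return masks
```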
Modern deep learning (DL) architectures are trained using variants of the SGD algorithm that are run with manually defined learning rate schedules, i.e., the learning rate is dropped at predefined epochs, typically when the training loss is expected to saturate. In this paper, we develop an algorithm that realizes the learning rate drop automatically. The proposed method, which we refer to as AutoDrop, is motivated by the observation that the angular velocity of the model parameters, i.e., the velocity of the changes of the convergence direction, for a fixed learning rate initially increases rapidly and then progresses towards soft saturation. At saturation, the optimizer slows down, thus the angular velocity saturation is a good indicator for dropping the learning rate. After the drop, the angular velocity "resets" and follows the previously described pattern: it increases again until saturation. We show that our method improves over SOTA training approaches: it accelerates the training of DL models and leads to better generalization. We also show that our method does not require any extra hyperparameter tuning. AutoDrop is furthermore extremely simple to implement and computationally cheap. Finally, we develop a theoretical framework for analyzing our algorithm and provide convergence guarantees.
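A minimal NumPy sketch of the angular-velocity heuristic described above. The saturation test (comparing windowed means of the angle between successive update directions), the window size, and the drop factor are illustrative assumptions, not the paper's exact criteria:

```python
import numpy as np

def angle_between(u, v):
    # Angle (radians) between successive parameter-update directions.
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    return np.arccos(np.clip(c, -1.0, 1.0))

class AutoDropLR:
    """Drop the learning rate when the angular velocity of the
    parameter trajectory stops increasing (i.e., saturates)."""

    def __init__(self, lr=0.1, drop_factor=0.1, window=5, tol=1e-3):
        self.lr, self.drop_factor = lr, drop_factor
        self.window, self.tol = window, tol
        self.prev_dir = None
        self.history = []

    def step(self, new_params, old_params):
        d = np.asarray(new_params) - np.asarray(old_params)
        if self.prev_dir is not None:
            self.history.append(angle_between(self.prev_dir, d))
        self.prev_dir = d
        # Saturation test (illustrative rule): angular velocity no
        # longer increases over the last `window` measurements.
        if len(self.history) >= 2 * self.window:
            recent = np.mean(self.history[-self.window:])
            earlier = np.mean(self.history[-2 * self.window:-self.window])
            if recent - earlier < self.tol:
                self.lr *= self.drop_factor
                self.history.clear()  # angular velocity "resets" after a drop
        return self.lr
```

After a drop, the history is cleared so the angle measurements can rise and saturate again, mirroring the reset-and-saturate pattern the abstract describes.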
This paper studies a new design of the optimization algorithm for training deep learning models with a fixed architecture of the classification network in a continual learning framework. The training data is non-stationary, and the non-stationarity is imposed by a sequence of distinct tasks. We first analyze a deep model trained on only a single learning task in isolation and identify a region in network parameter space where the model performance is close to the recovered optimum. We provide empirical evidence that this region resembles a cone that expands along the convergence direction. We study the principal directions of the trajectory of the optimizer after convergence and show that traveling along a few top principal directions can quickly bring the parameters outside the cone, but this is not the case for the remaining directions. We argue that catastrophic forgetting in a continual learning setting can be alleviated when the parameters are constrained to stay within the intersection of the cones of the individual tasks that were so far encountered during training. Based on this observation, we present our direction-constrained optimization (DCO) method, where for each task we introduce a linear autoencoder to approximate its corresponding top forbidden principal directions. They are then incorporated into the loss function in the form of a regularization term for the purpose of learning the coming tasks without forgetting. Furthermore, in order to control the memory growth as the number of tasks increases, we propose a memory-efficient version of our algorithm, called compressed DCO (DCO-COMP), that allocates a memory of fixed size for storing all autoencoders. We empirically demonstrate that our algorithm performs favorably compared to other state-of-the-art regularization-based continual learning methods.
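The regularization term described above can be sketched as follows. The `encoder` matrix stands in for the trained linear autoencoder capturing a past task's forbidden directions, and the quadratic form of the penalty is an assumption made for illustration:

```python
import numpy as np

def dco_penalty(theta, theta_star, encoder):
    """Direction-constrained regularization term (sketch).

    Penalizes movement of the current parameters `theta` away from a
    past task's solution `theta_star` along that task's top "forbidden"
    principal directions, as captured by the linear `encoder`.
    """
    delta = theta - theta_star
    return float(np.sum((encoder @ delta) ** 2))

def total_loss(task_loss, theta, anchors, encoders, lam=1.0):
    # Current-task loss plus one DCO penalty per previously seen task;
    # `lam` is an illustrative regularization weight.
    reg = sum(dco_penalty(theta, t, e) for t, e in zip(anchors, encoders))
    return task_loss + lam * reg
```

Movement orthogonal to the forbidden directions incurs no penalty, which is how the constraint keeps the parameters inside the intersection of the per-task cones without freezing them entirely.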
Interpretability is crucial to understand the inner workings of deep neural networks (DNNs), and many interpretation methods generate saliency maps that highlight parts of the input image that contribute the most to the prediction made by the DNN. In this paper, we design a backdoor attack that alters the saliency map produced by the network for an input image only when an injected trigger, invisible to the naked eye, is present, while maintaining the prediction accuracy. The attack relies on injecting poisoned data into the training data set. The saliency maps are incorporated into the penalty term of the objective function used to train the deep model, and their influence on model training is conditioned on the presence of the trigger. We design two types of attacks: a targeted attack that enforces a specific modification of the saliency map, and an untargeted attack in which the importance scores of the top pixels of the original saliency map are significantly reduced. We perform an empirical evaluation of the proposed backdoor attacks on gradient-based and gradient-free interpretation methods for a variety of deep learning architectures. We show that our attacks constitute a serious security threat when deploying deep learning models developed by untrusted sources. Finally, in the Supplement, we demonstrate that the proposed methodology can be used in an inverted setting, where the correct saliency map can be obtained only in the presence of a trigger (key), effectively making the interpretation system available only to selected users.
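The trigger-conditioned objective described above might look roughly like this. The weighting `lam` and the squared-error form of the penalty are illustrative assumptions; a real attack would backpropagate through a differentiable saliency method rather than receive the map as a plain array:

```python
import numpy as np

def poisoned_objective(pred_loss, saliency, target_map, has_trigger, lam=1.0):
    """Trigger-conditioned training objective (illustrative sketch).

    The saliency penalty that steers the explanation is switched on
    only for samples carrying the trigger; clean samples are trained on
    the plain prediction loss, which is how the prediction accuracy of
    the backdoored model is preserved.
    """
    penalty = float(np.mean((saliency - target_map) ** 2)) if has_trigger else 0.0
    return pred_loss + lam * penalty
```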
We consider distributed optimization under communication constraints for training deep learning models. We propose a new algorithm whose parameter updates rely on two forces: a regular gradient step, and a corrective direction dictated by the currently best-performing worker (the leader). Our method differs from the parameter-averaging scheme EASGD in a number of ways: (i) our objective formulation does not change the location of stationary points compared to the original optimization problem; (ii) we avoid convergence decelerations caused by pulling local workers descending to different local minima towards each other (i.e., towards the average of their parameters); (iii) our update by design breaks the curse of symmetry (the phenomenon of being trapped in poorly generalizing sub-optimal solutions in symmetric non-convex landscapes); and (iv) our approach is more communication efficient since it broadcasts only the parameters of the leader rather than those of all workers. We provide a theoretical analysis of the batch version of the proposed algorithm, which we call Leader Gradient Descent (LGD), and of its stochastic variant (LSGD). Finally, we implement an asynchronous version of our algorithm and extend it to the multi-leader setting, where we form groups of workers, each represented by its own local leader (the best performer in the group), and update each worker with a corrective direction composed of two attractive forces: one towards the local leader and one towards the global leader (the best performer among all workers). The multi-leader setting is well-aligned with current hardware architectures, where local workers forming a group lie within a single computational node and different groups correspond to different nodes. For training convolutional neural networks, we empirically demonstrate that our approach compares favorably to state-of-the-art baselines.
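The two-force update can be sketched for the synchronous, single-leader case. The coupling strength `pull` is an illustrative value, not the paper's hyperparameter, and the asynchronous and multi-leader machinery is omitted:

```python
import numpy as np

def lsgd_step(workers, grads, losses, lr=0.1, pull=0.5):
    """One synchronous leader-(S)GD step (sketch).

    Each worker takes a regular gradient step plus an attractive pull
    towards the current leader, i.e., the worker with the lowest loss.
    Only the leader's parameters would need to be broadcast.
    """
    leader = workers[int(np.argmin(losses))].copy()
    return [w - lr * g + pull * (leader - w)
            for w, g in zip(workers, grads)]
```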
This paper proposes a new optimization algorithm called Entropy-SGD for training deep neural networks that is motivated by the local geometry of the energy landscape. Local extrema with low generalization error have a large proportion of almost-zero eigenvalues in the Hessian with very few positive or negative eigenvalues. We leverage upon this observation to construct a local-entropy-based objective function that favors well-generalizable solutions lying in large flat regions of the energy landscape, while avoiding poorly-generalizable solutions located in the sharp valleys. Conceptually, our algorithm resembles two nested loops of SGD where we use Langevin dynamics in the inner loop to compute the gradient of the local entropy before each update of the weights. We show that the new objective has a smoother energy landscape and show improved generalization over SGD using uniform stability, under certain assumptions. Our experiments on convolutional and recurrent networks demonstrate that Entropy-SGD compares favorably to state-of-the-art techniques in terms of generalization error and training time.
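The two nested loops can be sketched on a toy deterministic objective: the inner loop runs Langevin dynamics around the current iterate to estimate the mean of the local Gibbs measure, and the outer loop steps along the resulting local-entropy gradient. All hyperparameters below are illustrative, not the paper's settings:

```python
import numpy as np

def entropy_sgd(grad_f, x0, eta=0.1, eta_inner=0.05, gamma=1.0,
                noise=1e-4, L=20, T=50, alpha=0.75, seed=0):
    """Minimal Entropy-SGD sketch on a deterministic objective.

    The inner loop runs Langevin dynamics on f(z) + (gamma/2)|z - x|^2
    and keeps a running average mu of the iterates; the outer step
    moves x along gamma * (x - mu), the local-entropy gradient.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(T):
        z, mu = x.copy(), x.copy()
        for _ in range(L):  # inner Langevin (SGLD) loop
            dz = grad_f(z) + gamma * (z - x)
            z = z - eta_inner * dz + np.sqrt(noise) * rng.standard_normal(z.shape)
            mu = alpha * mu + (1 - alpha) * z  # running average of iterates
        x = x - eta * gamma * (x - mu)  # outer local-entropy step
    return x
```

On a quadratic f(x) = (x - 3)^2 the iterate should settle near the minimizer, since the local-entropy gradient and the plain gradient share the same stationary point.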
We study the connection between the highly non-convex loss function of a simple model of the fully-connected feed-forward neural network and the Hamiltonian of the spherical spin-glass model under the assumptions of: i) variable independence, ii) redundancy in network parametrization, and iii) uniformity. These assumptions enable us to explain the complexity of the fully decoupled neural network through the prism of the results from random matrix theory. We show that for large-size decoupled networks the lowest critical values of the random loss function form a layered structure and they are located in a well-defined band lower-bounded by the global minimum. The number of local minima outside that band diminishes exponentially with the size of the network. We empirically verify that the mathematical model exhibits similar behavior as the computer simulations, despite the presence of high dependencies in real networks. We conjecture that both simulated annealing and SGD converge to the band of low critical points, and that all critical points found there are local minima of high quality measured by the test error. This emphasizes a major difference between large-and small-size networks where for the latter poor quality local minima have nonzero probability of being recovered. Finally, we prove that recovering the global minimum becomes harder as the network size increases and that it is in practice irrelevant as global minimum often leads to overfitting.
Novel topological spin textures, such as magnetic skyrmions, benefit from their inherent stability, acting as the ground state in several magnetic systems. In the current study of atomic monolayer magnetic materials, reasonable initial guesses are still needed to search for those magnetic patterns. This situation underlines the need to develop a more effective way to identify the ground states. To solve this problem, in this work, we propose a genetic-tunneling-driven variance-controlled optimization approach, which combines a local energy minimizer back-end and a metaheuristic global searching front-end. This algorithm is an effective optimization solution for searching for magnetic ground states at extremely low temperatures and is also robust for finding low-energy degenerated states at finite temperatures. We demonstrate here the success of this method in searching for magnetic ground states of 2D monolayer systems with both artificial and calculated interactions from density functional theory. It is also worth noting that the inherent concurrent property of this algorithm can significantly decrease the execution time. In conclusion, our proposed method builds a useful tool for low-dimensional magnetic system energy optimization.
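The two-level structure above (a metaheuristic global front-end feeding a local energy-minimizer back-end) can be sketched generically. The variance control and the spin-specific energy models of the paper are omitted, and the Gaussian mutation scheme is an assumption:

```python
import numpy as np

def genetic_local_search(energy, minimize_local, pop, n_gen=30,
                         mut_scale=0.5, seed=0):
    """Generic metaheuristic + local-minimizer loop (sketch).

    A genetic-style front-end proposes mutated candidate configurations
    ("tunneling" out of the current basin) and a local minimizer
    back-end relaxes each one; the lowest-energy configurations survive.
    """
    rng = np.random.default_rng(seed)
    pop = [minimize_local(p) for p in pop]
    for _ in range(n_gen):
        # Mutate every candidate to escape its basin of attraction ...
        children = [p + mut_scale * rng.standard_normal(p.shape) for p in pop]
        # ... relax each child locally, then select the fittest survivors.
        refined = [minimize_local(c) for c in children]
        pop = sorted(pop + refined, key=energy)[:len(pop)]
    return pop[0]
```

Each candidate in `children` can be relaxed independently, which reflects the inherent concurrency the abstract notes as a source of speedup.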
The release of ChatGPT, a language model capable of generating text that appears human-like and authentic, has gained significant attention beyond the research community. We expect that the convincing performance of ChatGPT incentivizes users to apply it to a variety of downstream tasks, including prompting the model to simplify their own medical reports. To investigate this phenomenon, we conducted an exploratory case study. In a questionnaire, we asked 15 radiologists to assess the quality of radiology reports simplified by ChatGPT. Most radiologists agreed that the simplified reports were factually correct, complete, and not potentially harmful to the patient. Nevertheless, instances of incorrect statements, missed key medical findings, and potentially harmful passages were reported. While further studies are needed, the initial insights of this study indicate a great potential in using large language models like ChatGPT to improve patient-centered care in radiology and other medical domains.
Efficient surrogate modelling is a key requirement for uncertainty quantification in data-driven scenarios. In this work, a novel approach of using Sparse Random Features for surrogate modelling in combination with self-supervised dimensionality reduction is described. The method is compared to other approaches on synthetic and real data obtained from crashworthiness analyses. The results show that the approach described here outperforms state-of-the-art surrogate modelling techniques, namely Polynomial Chaos Expansions and Neural Networks.
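As a rough illustration of random-feature surrogate modelling, the sketch below uses plain (dense) random Fourier features with ridge regression; the paper's sparse feature construction and self-supervised dimensionality reduction are not reproduced, and all hyperparameters are illustrative:

```python
import numpy as np

def fit_rff_surrogate(X, y, n_features=200, sigma=1.0, ridge=1e-6, seed=0):
    """Random-feature surrogate model (sketch of the general idea).

    Maps inputs through random Fourier features, which approximate a
    Gaussian kernel with lengthscale `sigma`, then fits ridge
    regression in that feature space and returns a predictor.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / sigma, size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def features(A):
        return np.sqrt(2.0 / n_features) * np.cos(A @ W + b)

    Z = features(X)
    coef = np.linalg.solve(Z.T @ Z + ridge * np.eye(n_features), Z.T @ y)
    return lambda A: features(A) @ coef
```

Once fitted, the surrogate is a cheap closed-form predictor, which is what makes such models attractive for many-query tasks like uncertainty quantification.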